Most signals in life are continuous: pressure waves propogating through air, chemical reactions, body movement. For computers to process these continuous signals, however, they must be converted to digital representations via a Analog-to-Digital Converter (ADC). One major way in which a digital signal is different from its continous counterpart is that it is sampled at specific time steps. For example, sound is often sampled at 44.1 kHz (or once every 0.023 milliseconds) and an accelerometer is often sampled at 100 Hz (once every 0.01 seconds).
In this example, we will use audio data is our primary signal. Sound is a wonderful medium for learning because we can hear the signal. Recall that a microphone responds to air pressure waves. We suggest plugging in your headphones, so you can really hear the distinctions in the various audio samples.
This notebook requires LibROSA—a python package for music and audio analysis. To install this package, you have two options.
First, from within Notebook, you can execute the following two lines within a cell (you'll only need to run this once):
import sys
!{sys.executable} -m pip install librosa
Second, from within your Anaconda shell:
> conda install -c conda-forge librosa
This Notebook was designed and written by Professor Jon E. Froehlich at the University of Washington along with feedback from students. It is made available freely online as an open educational resource at the teaching website: https://makeabilitylab.github.io/physcomp/.
The notebook code is open source using the MIT license.
import librosa
import librosa.display
import IPython.display as ipd
import matplotlib.pyplot as plt # matplot lib is the premiere plotting lib for Python: https://matplotlib.org/
import numpy as np # numpy is the premiere signal handling library for Python: http://www.numpy.org/
import scipy as sp # for signal processing
import scipy.io # for signal processing
import scipy.io.wavfile # for signal processing
from scipy import signal
import random

A key factor in digitizing a signal is the rate at which the analog signal is sampled (or captured). How often must you sample a signal to perfectly reconstruct it?
The answer may surprise you and involves one of the most fundamental (and interesting) theorems in signal processing: the Nyquist Sampling Theorem, which states that a continuous signal can be reconstructed as long as there are two samples per period for the highest frequency component in the underlying signal.
That is, for a perfect reconstruction, our digital sampling frequency $f_s$ must be at least twice as fast as the fastest frequency in our continuous signal: $f_s = 2 * max(analog_{freq})$.
For example, imagine we have an analog signal composed of frequencies between 0 and 2,000 Hz. To properly digitize this signal, we must sample at $2 * 2,000Hz$. So, $f_s$ needs to be 4,000Hz.
Now imagine that the fastest our digitizer can sample is 6,000 Hz: what frequency range can we properly capture? Since we need a minimum of two samples per period for proper reconstruction, we can only signals that change with a frequency of 0 to a maximum of 3,000Hz. This 3,000Hz limit is called the Nyquist rate or Nyquist limit: it is $\frac{1}{2}$ the sampling rate $f_s$.
For many applications related to Human-Computer Interaction and Ubiquitous Computing, sampling at 4kHz is more than sufficient. This enables analysis of any signal between 0-2kHz. Human motion—ambulatory movement, limb motion, finger gestures, etc.—simply does not change that fast. Even electroencephalograms (EEG), which measure electrical activity in the brain, are often sampled at 500-1000Hz. However, for recording sound (humans can hear between ~0-20kHz), faster sampling rates are necessary.
What happens if we sample a signal with frequency components greater than the Nyquist limit ($> \frac{1}{2} * f_s$)? We get aliasing—a problem where the higher frequency components of a signal (those greater than the Nyquist limit) appear as lower frequency components. As Smith notes, "just as a criminal might take on an asumed named or identity (an alias), the sinusoid assumes another frequency that is not its own." And perhaps more nefariously, there is nothing in the sampled data to suggest that aliasing has occured: "the sine wave has hidden its true identity completely". See figure below.

Let's look at an example. Here, we'll sample four signals at a sampling rate of 50Hz: signal1 = 5Hz, signal2 = 10Hz, signal3 = 20Hz, and signal4 = 60Hz. All but signal4 are under our Nyquist limit, which is $\frac{1}{2} * 50Hz = 25Hz$. What will happen?
The "samples" are shown as vertical lines with square rectangle markers. What do you observe? Pay close attention to signal4...
def create_sine_wave(freq, sampling_rate, total_time_in_secs = None, return_time = False):
'''Creates a sine wave with the given frequency, sampling rate, and length'''
# if the total time in secs is None, then return one period of the wave
if total_time_in_secs is None:
total_time_in_secs = 1 / freq
# Create an array from 0 to total_time_in_secs * sampling_rate (and then divide by sampling
# rate to get each time_step)
time = np.arange(total_time_in_secs * sampling_rate) / sampling_rate
# Could also generate this signal by:
# time = np.linspace(0, total_time_in_secs, int(total_time_in_secs * sampling_rate), endpoint=False)
sine_wave = np.sin(2 * np.pi * freq * time)
# or, once the sample is made:
# time = np.linspace(0, len(s) / sampling_rate, num=len(s))
if return_time is False:
return sine_wave
else:
return (time, sine_wave)
def plot_sampling_demonstration(total_time_in_secs, real_world_freqs, real_world_continuous_speed = 10000, resample_factor = 200):
'''Used to demonstrate digital sampling and uses stem plots to show where samples taken'''
num_charts = len(real_world_freqs)
fig_height = num_charts * 3.25
fig, axes = plt.subplots(num_charts, 1, figsize=(15, fig_height))
time = None
i = 0
sampling_rate = real_world_continuous_speed / resample_factor
print(f"Sampling rate: {sampling_rate} Hz")
for real_world_freq in real_world_freqs:
time, real_world_signal = create_sine_wave(real_world_freq, real_world_continuous_speed,
total_time_in_secs, return_time = True)
sampled_time = time[::resample_factor]
sampled_signal = real_world_signal[::resample_factor]
axes[i].plot(time, real_world_signal)
axes[i].axhline(0, color="gray", linestyle="-", linewidth=0.5)
axes[i].plot(sampled_time, sampled_signal, linestyle='None', alpha=0.8, marker='s', color='black')
axes[i].vlines(sampled_time, ymin=0, ymax=sampled_signal, linestyle='-.', alpha=0.8, color='black')
axes[i].set_ylabel("Amplitude")
axes[i].set_xlabel("Time (secs)")
axes[i].set_title(f"{real_world_freq}Hz signal sampled at {sampling_rate}Hz")
i += 1
fig.tight_layout(pad = 3.0)
total_time_in_secs = 0.2
# Create our "real-world" continuous signals (which is obviously not possible on a digital computer, so we fake it)
real_world_continuous_speed = 10000
real_world_signal1_freq = 5
time, real_world_signal1 = create_sine_wave(real_world_signal1_freq, real_world_continuous_speed,
total_time_in_secs, return_time = True)
real_world_signal2_freq = 10
real_world_signal2 = create_sine_wave(real_world_signal2_freq, real_world_continuous_speed,
total_time_in_secs)
real_world_signal3_freq = 20
real_world_signal3 = create_sine_wave(real_world_signal3_freq, real_world_continuous_speed,
total_time_in_secs)
real_world_signal4_freq = 60
real_world_signal4 = create_sine_wave(real_world_signal4_freq, real_world_continuous_speed,
total_time_in_secs)
# Create the sampled versions of these continuous signals
resample_factor = 200 # should be an integer
sampled_time = time[::resample_factor]
sampled_signal1 = real_world_signal1[::resample_factor]
sampled_signal2 = real_world_signal2[::resample_factor]
sampled_signal3 = real_world_signal3[::resample_factor]
sampled_signal4 = real_world_signal4[::resample_factor]
sampling_rate = real_world_continuous_speed / resample_factor
print(f"Sampling rate: {sampling_rate} Hz")
# Visualize the sampled versions
fig, axes = plt.subplots(4, 1, figsize=(15,13))
axes[0].plot(time, real_world_signal1)
axes[0].axhline(0, color="gray", linestyle="-", linewidth=0.5)
axes[0].plot(sampled_time, sampled_signal1, linestyle='None', alpha=0.8, marker='s', color='black')
axes[0].vlines(sampled_time, ymin=0, ymax=sampled_signal1, linestyle='-.', alpha=0.8, color='black')
axes[0].set_ylabel("Amplitude")
axes[0].set_xlabel("Time (secs)")
axes[0].set_title(f"{real_world_signal1_freq}Hz signal sampled at {sampling_rate}Hz")
axes[1].plot(time, real_world_signal2)
axes[1].axhline(0, color="gray", linestyle="-", linewidth=0.5)
axes[1].plot(sampled_time, sampled_signal2, linestyle='None', alpha=0.8, marker='s', color='black')
axes[1].vlines(sampled_time, ymin=0, ymax=sampled_signal2, linestyle='-.', alpha=0.8, color='black')
axes[1].set_ylabel("Amplitude")
axes[1].set_xlabel("Time (secs)")
axes[1].set_title(f"{real_world_signal2_freq}Hz signal sampled at {sampling_rate}Hz")
axes[2].plot(time, real_world_signal3)
axes[2].axhline(0, color="gray", linestyle="-", linewidth=0.5)
axes[2].plot(sampled_time, sampled_signal3, linestyle='None', alpha=0.8, marker='s', color='black')
axes[2].vlines(sampled_time, ymin=0, ymax=sampled_signal3, linestyle='-.', alpha=0.8, color='black')
axes[2].set_ylabel("Amplitude")
axes[2].set_xlabel("Time (secs)")
axes[2].set_title(f"{real_world_signal3_freq}Hz signal sampled at {sampling_rate}Hz")
axes[3].plot(time, real_world_signal4)
axes[3].axhline(0, color="gray", linestyle="-", linewidth=0.5)
axes[3].plot(sampled_time, sampled_signal4, linestyle='None', alpha=0.8, marker='s', color='black')
axes[3].vlines(sampled_time, ymin=0, ymax=sampled_signal4, linestyle='-.', alpha=0.8, color='black')
axes[3].set_ylabel("Amplitude")
axes[3].set_xlabel("Time (secs)")
axes[3].set_title(f"{real_world_signal4_freq}Hz signal sampled at {sampling_rate}Hz")
fig.tight_layout(pad = 3.0)
Sampling rate: 50.0 Hz
Let's take a closer look at signal2 = 10Hz and signal4 = 60Hz. To make it easier to see the sampled signal, we'll lighten the underlying continuous (real-world) signal in blue.
fig, axes = plt.subplots(2, 1, figsize=(15,7))
axes[0].plot(time, real_world_signal2, alpha=0.5)
axes[0].axhline(0, color="gray", linestyle="-", linewidth=0.5)
axes[0].plot(sampled_time, sampled_signal2, linestyle='None', alpha=0.8, marker='s', color='black')
axes[0].vlines(sampled_time, ymin=0, ymax=sampled_signal2, linestyle='-.', alpha=0.8, color='black')
axes[0].set_ylabel("Amplitude")
axes[0].set_xlabel("Time (secs)")
axes[0].set_title(f"{real_world_signal2_freq}Hz signal sampled at {sampling_rate}Hz")
axes[1].plot(time, real_world_signal4, alpha=0.5)
axes[1].axhline(0, color="gray", linestyle="-", linewidth=0.5)
axes[1].plot(sampled_time, sampled_signal4, linestyle='None', alpha=0.8, marker='s', color='black')
axes[1].vlines(sampled_time, ymin=0, ymax=sampled_signal4, linestyle='-.', alpha=0.8, color='black')
axes[1].set_ylabel("Amplitude")
axes[1].set_xlabel("Time (secs)")
axes[1].set_title(f"{real_world_signal4_freq}Hz signal sampled at {sampling_rate}Hz")
fig.tight_layout(pad = 3.0)
Can you see it? The 60Hz signal is being aliased as 10Hz. And once the signal is digitized (as it is here), there would be no way to tell the difference between an actual 10Hz signal and an aliased one!
Why?
Look at the graphs, at the first sample, both sinusoids are just beginning; however, by the next sample, the 60Hz sinusoid has almost completed one full period!
Given the aliasing formula, with a 50Hz sampling rate, 40Hz, 60Hz, 90Hz, 110Hz, 140Hz... will all be aliased to 10Hz. However, note that 40Hz and 60Hz (and 90Hz and 110Hz, and so on) will be aliased to 10Hz but with a phase shift of one-half period.
Similarly, 70Hz, 80Hz, 120Hz, and 130Hz will all be alised to 20Hz and 50Hz, 100Hz, 150Hz, etc. will all be aliased to zero.
Let's check it out:
# feel free to change these values to see other patterns
real_world_freqs = [10, 40, 60, 90, 110, 140]
plot_sampling_demonstration(total_time_in_secs, real_world_freqs)
Sampling rate: 50.0 Hz
Let's keep experimenting with the Nyquist limit and aliasing but this time with sound data. Sound is a bit harder to visualize than our synthetic signals above because it's very high frequency (comparatively) but we'll provide zoomed insets to help.
We will also be using spectrogram visualizations to help us investigate the effect of lower sampling rates on the signal. A spectrogram plots the frequency components of our signal over time.
Let's start with a frequency sweep from 0 to 22,050Hz over the course of 30 sec period.
sampling_rate, freq_sweep_44100 = sp.io.wavfile.read('data/audio/FrequencySweep_0-22050Hz_30secs.wav')
#sampling_rate, audio_data_44100 = sp.io.wavfile.read('data/audio/greenday.wav')
quantization_bits = 16
print(f"{quantization_bits}-bit audio") # we assume 16 bit audio
print(f"Sampling rate: {sampling_rate} Hz")
print(f"Number of channels = {len(freq_sweep_44100.shape)}")
print(f"Total samples: {freq_sweep_44100.shape[0]}")
if len(freq_sweep_44100.shape) == 2:
# convert to mono
print("Converting stereo audio file to mono")
freq_sweep_44100 = freq_sweep_44100.sum(axis=1) / 2
# Set a zoom area (a bit hard to see but highlighted in red in spectrogram)
xlim_zoom = (11500, 13500)
ipd.Audio(freq_sweep_44100, rate=sampling_rate)
16-bit audio Sampling rate: 44100 Hz Number of channels = 1 Total samples: 1323000
Now, imagine that we captured this frequency sweep with a $f_s$ of 11,025Hz. What will the sweep sound like now? What's the maximum frequency that we can capture with $f_s = 11,025Hz$?
Well, from the Nyquist limit, we know that with $f_s$, the maximum capturable frequency is $\frac{1}{2} * f_s$. Thus, the maximum frequency that we can capture is $\frac{1}{2} * 11,025 Hz = 5512.5 Hz$. Recall, however, that the underlying audio signal contains frequencies from 0 to 22,050Hz. So, what will happen with frequencies between 5,512.5 - 22,050Hz in our signal?
Yes, aliasing strikes again. Those higher frequency signals will be aliased to lower frequencies.
Let's see the problem in action below.
resample_factor = 4
new_sampling_rate = int(sampling_rate / resample_factor)
freq_sweep_11025 = freq_sweep_44100[::resample_factor]
resample_xlim_zoom = (xlim_zoom[0] / resample_factor, xlim_zoom[1] / resample_factor)
# plot_waveform(audio_data_11025, new_sampling_rate, quantization_bits, xlim_zoom = resample_xlim_zoom)
print(f"Sampling rate: {sampling_rate} Hz with Nyquist limit {int(sampling_rate / 2)} Hz")
print(f"New sampling rate: {new_sampling_rate} Hz with Nyquist limit {int(new_sampling_rate / 2)} Hz")
ipd.Audio(freq_sweep_11025, rate=new_sampling_rate)
Sampling rate: 44100 Hz with Nyquist limit 22050 Hz New sampling rate: 11025 Hz with Nyquist limit 5512 Hz
With a sampling rate of 11,025Hz, we see aliasing occur when the frequency sweep hits 5,512.5 Hz.
Let's play a chord composed of the following frequencies: 261.626Hz, 293.665Hz, 391.995Hz, 2093Hz, and 4186.01Hz.
Given that the highest frequency in this signal is 4,186.01Hz, we need a minimum sampling rate of $2 * 4186.01Hz = 8372.02Hz$ to capture the highest frequency and $2 * 2093Hz = 4,186Hz$ to capture the next highest frequency.
The original sampling was at 44,100Hz, which is more than sufficient to capture the sound signal.
# Equation used to produce the audio file
# 0.2 * sin(2*pi*261.626*t) + 0.2 * sin(2*pi*293.665*t) + 0.2 * sin(2*pi*391.995*t) + 0.2 * sin(2*pi*2093*t) + 0.2 * sin(2*pi*4186.01*t)
sinewavechord_soundfile = 'data/audio/SineWaveChord_MinFreq261_MaxFreq4186_2secs.wav'
sampling_rate, sine_wave_chord_44100 = sp.io.wavfile.read(sinewavechord_soundfile)
#sampling_rate, audio_data_44100 = sp.io.wavfile.read('data/audio/greenday.wav')
print(f"Sampling rate: {sampling_rate} Hz")
print(f"Number of channels = {len(sine_wave_chord_44100.shape)}")
print(f"Total samples: {sine_wave_chord_44100.shape[0]}")
if len(sine_wave_chord_44100.shape) == 2:
# convert to mono
print("Converting stereo audio file to mono")
sine_wave_chord_44100 = sine_wave_chord_44100.sum(axis=1) / 2
quantization_bits = 16
xlim_zoom = (10000, 11000)
ipd.Audio(sine_wave_chord_44100, rate=sampling_rate)
Sampling rate: 44100 Hz Number of channels = 1 Total samples: 88200
At a sampling rate of 11,025Hz, the Nyquist limit (5,512Hz) is still above the maximum frequency in our signal (4186.01 Hz), so the audio should sound the exact same as it did for the original 44,100 Hz sample above (and no aliasing will occur).
resample_factor = 4
new_sampling_rate = int(sampling_rate / resample_factor)
sine_wave_chord_11025 = sine_wave_chord_44100[::resample_factor]
resample_xlim_zoom = (xlim_zoom[0] / resample_factor, xlim_zoom[1] / resample_factor)
# plot_waveform(audio_data_11025, new_sampling_rate, quantization_bits, xlim_zoom = resample_xlim_zoom)
print(f"Sampling rate: {sampling_rate} Hz with Nyquist limit {int(sampling_rate / 2)} Hz")
print(f"New sampling rate: {new_sampling_rate} Hz with Nyquist limit {int(new_sampling_rate / 2)} Hz")
ipd.Audio(sine_wave_chord_11025, rate=new_sampling_rate)
Sampling rate: 44100 Hz with Nyquist limit 22050 Hz New sampling rate: 11025 Hz with Nyquist limit 5512 Hz
What about if the sampling rate $f_s$ is 4,410Hz? Then the Nyquist limit is 2,205Hz, which is below the 4186.01 Hz signal.
Recall our aliased frequency formula: so what will 4186.01Hz show up as in our sampled signal? It will be aliased as 223.99Hz.
Let's check to see what happens.
resample_factor = 10
new_sampling_rate = int(sampling_rate / resample_factor)
sine_wave_chord_4410 = sine_wave_chord_44100[::resample_factor]
resample_xlim_zoom = (xlim_zoom[0] / resample_factor, xlim_zoom[1] / resample_factor)
print(f"Sampling rate: {sampling_rate} Hz with Nyquist limit {int(sampling_rate / 2)} Hz")
print(f"New sampling rate: {new_sampling_rate} Hz with Nyquist limit {int(new_sampling_rate / 2)} Hz")
ipd.Audio(sine_wave_chord_4410, rate=new_sampling_rate)
Sampling rate: 44100 Hz with Nyquist limit 22050 Hz New sampling rate: 4410 Hz with Nyquist limit 2205 Hz
Below, we downsample a 44.1 kHz human voice to: 22.5kHz, 11,025Hz ... 441 Hz. For each downsampling, we visualize the original 44.1 kHz waveform as well as its downsampled counterpart.
# feel free to change this to some other 44.1kHz sound file
sound_file = 'data/audio/HumanVoice-Hello_16bit_44.1kHz_mono.wav'
sampling_rate, audio_data_44100 = sp.io.wavfile.read(sound_file)
print(f"Sampling rate: {sampling_rate} Hz")
print(f"Number of channels = {len(audio_data_44100.shape)}")
print(f"Total samples: {audio_data_44100.shape[0]}")
if len(audio_data_44100.shape) == 2:
# convert to mono
print("Converting stereo audio file to mono")
audio_data_44100 = audio_data_44100.sum(axis=1) / 2
quantization_bits = 16
xlim_zoom = (11500, 12500)
ipd.Audio(audio_data_44100, rate=sampling_rate)
Sampling rate: 44100 Hz Number of channels = 1 Total samples: 30833
resample_factor = 2
new_sampling_rate = int(sampling_rate / resample_factor)
audio_data_22500 = audio_data_44100[::resample_factor]
print(f"New sampling rate: {new_sampling_rate} Hz")
resample_xlim_zoom = (xlim_zoom[0] / resample_factor, xlim_zoom[1] / resample_factor)
print("resample_xlim_zoom", resample_xlim_zoom)
#plot_audio(audio_data_22500, new_sampling_rate, quantization_bits, xlim_zoom = resample_xlim_zoom)
ipd.Audio(audio_data_22500, rate=new_sampling_rate)
ipd.Audio(audio_data_22500, rate=new_sampling_rate)
New sampling rate: 22050 Hz resample_xlim_zoom (5750.0, 6250.0)
resample_factor = 4
new_sampling_rate = int(sampling_rate / resample_factor)
audio_data_11025 = audio_data_44100[::resample_factor]
resample_xlim_zoom = (xlim_zoom[0] / resample_factor, xlim_zoom[1] / resample_factor)
# plot_waveform(audio_data_11025, new_sampling_rate, quantization_bits, xlim_zoom = resample_xlim_zoom)
print(f"Sampling rate: {sampling_rate} Hz with Nyquist limit {int(sampling_rate / 2)} Hz")
print(f"New sampling rate: {new_sampling_rate} Hz with Nyquist limit {int(new_sampling_rate / 2)} Hz")
ipd.Audio(audio_data_11025, rate=new_sampling_rate)
Sampling rate: 44100 Hz with Nyquist limit 22050 Hz New sampling rate: 11025 Hz with Nyquist limit 5512 Hz
resample_factor = 10
new_sampling_rate = sampling_rate / resample_factor
audio_data_4410 = audio_data_44100[::resample_factor]
resample_xlim_zoom = (xlim_zoom[0] / resample_factor, xlim_zoom[1] / resample_factor)
#plot_waveform(audio_data_4410, new_sampling_rate, quantization_bits, xlim_zoom = resample_xlim_zoom)
ipd.Audio(audio_data_4410, rate=new_sampling_rate)